
    Robust Group Linkage

    We study the problem of group linkage: linking records that refer to entities in the same group. Applications of group linkage include finding businesses in the same chain, finding conference attendees from the same affiliation, finding players on the same team, and so on. Group linkage faces challenges not present in traditional record linkage. First, although different members of the same group can share similar global values for an attribute, they represent different entities and can therefore have distinct local values for the same or different attributes, requiring a high tolerance for value diversity. Second, groups can be huge (with tens of thousands of records), requiring high scalability even after good blocking strategies are applied. We present a two-stage algorithm: the first stage identifies cores containing records that are very likely to belong to the same group, while being robust to possible erroneous values; the second stage collects strong evidence from the cores and leverages it to merge more records into the same group, while being tolerant to differences in the local values of an attribute. Experimental results show the high effectiveness and efficiency of our algorithm on various real-world data sets.
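    The two-stage idea lends itself to a compact sketch. The Python below is purely illustrative: the `name`/`phone` fields, the string-similarity measure, and both thresholds are assumptions rather than the paper's algorithm, but it shows how high-precision cores are formed first and their aggregated evidence is then used to absorb records with more diverse local values.

```python
# Illustrative two-stage group-linkage sketch (assumed fields and thresholds,
# not the authors' actual algorithm).
from collections import Counter
from difflib import SequenceMatcher

def sim(a, b):
    """Simple string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_cores(records, threshold=0.95):
    """Stage 1: cluster records whose names are near-identical (high-precision cores)."""
    cores = []  # each core is a list of record indices
    for i, rec in enumerate(records):
        placed = False
        for core in cores:
            if all(sim(rec["name"], records[j]["name"]) >= threshold for j in core):
                core.append(i)
                placed = True
                break
        if not placed:
            cores.append([i])
    return [c for c in cores if len(c) > 1]  # keep only multi-record cores

def merge_by_evidence(records, cores, threshold=0.8):
    """Stage 2: attach remaining records to the core whose strong evidence
    (here: the shared phone number) they match best."""
    assigned = {i for core in cores for i in core}
    evidence = [Counter(records[i]["phone"] for i in core) for core in cores]
    for i, rec in enumerate(records):
        if i in assigned:
            continue
        best, best_score = None, 0.0
        for k, ev in enumerate(evidence):
            score = ev[rec["phone"]] / sum(ev.values())  # share of matching evidence
            if score > best_score:
                best, best_score = k, score
        if best is not None and best_score >= threshold:
            cores[best].append(i)
    return cores

records = [
    {"name": "Acme Pizza #12", "phone": "555-0100"},
    {"name": "Acme Pizza #12", "phone": "555-0100"},
    {"name": "Acme Pizza - Downtown", "phone": "555-0100"},
]
print(merge_by_evidence(records, find_cores(records)))  # -> [[0, 1, 2]]
```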

    Vehicular Fog Computing Enabled Real-time Collision Warning via Trajectory Calibration

    Vehicular fog computing (VFC) has been envisioned as a promising paradigm for enabling a variety of emerging intelligent transportation systems (ITS). However, due to inevitable and non-negligible issues in wireless communication, including transmission latency and packet loss, it remains challenging to implement safety-critical applications such as real-time collision warning in vehicular networks. In this paper, we present a vehicular fog computing architecture aimed at supporting effective, real-time collision warning by offloading computation and communication overheads to distributed fog nodes. Building on this architecture, we further propose a trajectory calibration based collision warning (TCCW) algorithm along with tailored communication protocols. Specifically, the application-layer vehicle-to-infrastructure (V2I) communication delay is fitted with a stable distribution using real-world field-testing data. Then, a packet-loss detection mechanism is designed. Finally, TCCW calibrates real-time vehicle trajectories based on the received vehicle status, including GPS coordinates, velocity, acceleration, and heading direction, as well as the estimated communication delay and detected packet loss. For performance evaluation, we build a simulation model and implement conventional solutions, including cloud-based warning and fog-based warning without calibration, for comparison. Real-vehicle trajectories are extracted as input, and the simulation results demonstrate the effectiveness of TCCW, which achieves the highest precision and recall across a wide range of scenarios.
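    As a rough illustration of delay-compensated calibration (not the paper's TCCW protocol), the sketch below extrapolates a received vehicle status forward by an estimated V2I delay using a constant-acceleration, constant-heading model and then scans a short horizon with a proximity-based warning rule; the field names, the 3 m radius, and the delay values are all assumptions.

```python
# Minimal sketch of delay-compensated trajectory calibration and a
# proximity-based warning check (illustrative assumptions throughout).
import math
from dataclasses import dataclass

@dataclass
class VehicleStatus:
    x: float        # metres, east
    y: float        # metres, north
    speed: float    # m/s
    accel: float    # m/s^2 along heading
    heading: float  # radians from east

def calibrate(status: VehicleStatus, est_delay: float) -> VehicleStatus:
    """Extrapolate the reported state by the estimated communication delay
    under a constant-acceleration, constant-heading model."""
    t = est_delay
    dist = status.speed * t + 0.5 * status.accel * t * t
    return VehicleStatus(
        x=status.x + dist * math.cos(status.heading),
        y=status.y + dist * math.sin(status.heading),
        speed=status.speed + status.accel * t,
        accel=status.accel,
        heading=status.heading,
    )

def time_to_collision(a: VehicleStatus, b: VehicleStatus, horizon=5.0, step=0.1):
    """Scan a short horizon for the first time the calibrated trajectories
    come within an (assumed) 3 m collision radius."""
    for k in range(int(horizon / step)):
        fa, fb = calibrate(a, k * step), calibrate(b, k * step)
        if math.hypot(fa.x - fb.x, fa.y - fb.y) < 3.0:
            return k * step
    return None

ego = calibrate(VehicleStatus(0, 0, 15, 0, 0), est_delay=0.12)          # 120 ms V2I delay
other = calibrate(VehicleStatus(60, 0, 12, 0, math.pi), est_delay=0.20) # oncoming vehicle
print(time_to_collision(ego, other))  # -> ~2.0 s, so a warning can be raised before impact
```

    The point of the calibration step is that, without it, the warning logic would reason over positions that are already stale by the (stochastic) communication delay, which is exactly the gap the paper's Stable-distribution delay fitting and packet-loss detection address.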

    Stability and bifurcation analysis of Westwood+ TCP congestion control model in mobile cloud computing networks

    In this paper, we first build a Westwood+ TCP congestion control model with communication delay in mobile cloud computing networks. We then study the dynamics of this model by analyzing the distribution of the eigenvalues of its characteristic equation. Taking the communication delay as the bifurcation parameter, we derive linear stability criteria that depend on the delay. Furthermore, we study the direction of the Hopf bifurcation as well as the stability of the periodic solution of the Westwood+ TCP congestion control model with communication delay. We find that a Hopf bifurcation occurs when the communication delay passes through a sequence of critical values. The stability and direction of the Hopf bifurcation are determined using the normal form theory and the center manifold theorem. Finally, numerical simulations are performed to verify the theoretical results.
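    To illustrate the delay-as-bifurcation-parameter analysis (this is a generic scalar example, not the paper's actual Westwood+ model), consider a linearized delay equation with assumed constant coefficients a, b > 0 obtained from linearization about the equilibrium:

```latex
% Illustrative linearized delay model (assumed form, not the paper's equations)
\dot{x}(t) = -a\,x(t) - b\,x(t-\tau), \qquad a,\ b > 0.
% Substituting x(t) = e^{\lambda t} gives the delay-dependent characteristic equation
\lambda + a + b\,e^{-\lambda\tau} = 0.
% Purely imaginary roots \lambda = i\omega exist only when b > a, with
\omega = \sqrt{b^{2} - a^{2}}, \qquad
\tau_{k} = \frac{1}{\omega}\Bigl(\arccos\!\bigl(-\tfrac{a}{b}\bigr) + 2k\pi\Bigr),
\quad k = 0, 1, 2, \dots
% The equilibrium is asymptotically stable for \tau < \tau_0, and a Hopf
% bifurcation (onset of periodic oscillation) occurs as \tau crosses \tau_0.
```

    The stability and direction of the bifurcating periodic solution are then classified with the normal-form and center-manifold computations mentioned above.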

    Erasing-based lossless compression method for streaming floating-point time series

    Floating-point time series are generated in prohibitively large volumes and at an unprecedentedly high rate. Efficient, compact, and lossless compression of such data is of great importance in a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes an Erasing-based Lossless Floating-point compression algorithm, called Elf. The main idea of Elf is to erase the last few bits (i.e., set them to zero) of floating-point values, so that the XORed values contain many trailing zeros. The challenges of the erasing-based method are three-fold. First, how can the erased bits be determined quickly? Second, how can the original data be recovered losslessly from the erased values? Third, how can the erased data be encoded compactly? Through rigorous mathematical analysis, Elf can directly determine the erased bits and restore the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for XORed values with many trailing zeros. Furthermore, observing that the values in a time series usually have similar significand counts, we propose an upgraded version of Elf, named Elf+, that optimizes the significand-count encoding strategy, further improving the compression ratio and reducing the running time. Both Elf and Elf+ work in a streaming fashion. They take only O(N) time (where N is the length of the time series) and O(1) space, and achieve a notable compression ratio with a theoretical guarantee. Extensive experiments on 22 datasets show the strong performance of Elf and Elf+ compared with 9 advanced competitors for both double-precision and single-precision floating-point values.
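    The erasing idea can be illustrated with a short Python sketch. This is only a simplified illustration: unlike Elf, it does not restore the erased bits (so it is lossy as written), and the `keep` parameter and printed output are assumptions; it merely shows why zeroing low-order bits makes consecutive XORed values end in long runs of trailing zeros, which an encoder can then store compactly.

```python
# Simplified illustration of XOR delta encoding with trailing-bit erasure.
# Unlike the real Elf, this sketch does NOT recover the erased bits.
import struct

def to_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE-754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def erase_trailing(bits: int, keep: int = 40) -> int:
    """Zero out the last (64 - keep) bits so consecutive XORs gain trailing zeros."""
    mask = ~((1 << (64 - keep)) - 1) & 0xFFFFFFFFFFFFFFFF
    return bits & mask

def xor_stream(values):
    """Yield (leading_zeros, trailing_zeros, xor) for each value; the first
    value is XORed against zero, i.e. emitted as its (erased) bit pattern."""
    prev = 0
    for v in values:
        cur = erase_trailing(to_bits(v))
        delta = cur ^ prev
        prev = cur
        lead = 64 - delta.bit_length() if delta else 64
        trail = (delta & -delta).bit_length() - 1 if delta else 64
        yield lead, trail, delta

series = [20.31, 20.32, 20.33, 20.35, 20.34]
for lead, trail, delta in xor_stream(series):
    print(f"lead={lead:2d} trail={trail:2d} xor={delta:016x}")
```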

    Visible Red and Infrared Light Alters Gene Expression in Human Marrow Stromal Fibroblast Cells

    Objectives: This study tested whether gene expression in human marrow stromal fibroblast (MSF) cells depends on light wavelength and energy density. Material and Methods: Primary cultures of isolated human bone marrow stem cells (hBMSC) were exposed to visible red (VR, 633 nm) and infrared (IR, 830 nm) radiation from a light-emitting diode (LED) over a range of energy densities (0.5, 1.0, 1.5, and 2.0 J/cm2). Cultured cells were assayed for cell proliferation, osteogenic potential, adipogenesis, and mRNA and protein content. mRNA was analyzed by microarray and compared among the different wavelengths and energy densities. Mesenchymal and epithelial cell responses were compared to determine whether the responses were cell-type specific. Protein array analysis was used to further analyze key pathways identified by the microarrays. Results: Different wavelengths and energy densities produced unique sets of genes identified by microarray analysis. Pathway analysis pointed to TGF beta 1 in the visible red and Akt 1 in the infrared wavelengths as key pathways to study. TGF beta protein arrays suggested switching from canonical to non-canonical TGF beta pathways with increases to longer IR wavelengths. Microarrays suggest that RANKL and TIMP 10 followed IR energy-density dose-response curves. Epithelial and mesenchymal cells respond differently to stimulation by light, suggesting that a cell-type-specific response is possible. Conclusions: These studies demonstrate differential gene expression with different wavelengths, energy densities, and cell types. These differences in gene expression have the potential to be exploited for therapeutic purposes and can help explain contradictory results in the literature when wavelengths, energy densities, and cell types differ.

    Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology

    Deep learning in digital pathology brings intelligence and automation as substantial enhancements to pathological analysis, the gold standard of clinical diagnosis. However, the multiple steps from tissue preparation to slide imaging introduce various image corruptions, making it difficult for deep neural network (DNN) models to achieve stable diagnostic results for clinical use. In order to assess and further enhance the robustness of the models, we analyze the physical causes of full-stack corruptions throughout the pathological life-cycle and propose an Omni-Corruption Emulation (OmniCE) method to reproduce 21 types of corruptions quantified at 5 severity levels. We then construct three OmniCE-corrupted benchmark datasets at both patch and slide level and assess the robustness of popular DNNs on classification and segmentation tasks. Further, we explore using the OmniCE-corrupted datasets as augmentation data for training, and experiments verify that the generalization ability of the models is significantly enhanced.
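    A minimal sketch of severity-graded corruption emulation used for benchmarking or augmentation might look as follows; the two corruption functions, their severity scaling, and the API are illustrative assumptions standing in for the 21 OmniCE corruption types.

```python
# Illustrative severity-graded corruption emulation for image patches
# (assumed corruption types and parameters, not the OmniCE taxonomy).
import numpy as np

def box_blur(patch: np.ndarray, severity: int) -> np.ndarray:
    """Separable box blur whose kernel width grows with severity (1-5)."""
    k = 2 * severity + 1
    kernel = np.ones(k) / k
    out = patch.astype(np.float32)
    for axis in (0, 1):
        out = np.apply_along_axis(lambda m: np.convolve(m, kernel, mode="same"), axis, out)
    return np.clip(out, 0, 255).astype(patch.dtype)

def gauss_noise(patch: np.ndarray, severity: int) -> np.ndarray:
    """Additive Gaussian noise whose amplitude scales with severity (1-5)."""
    rng = np.random.default_rng(0)
    noise = rng.normal(0, 4 * severity, size=patch.shape)
    return np.clip(patch.astype(np.float32) + noise, 0, 255).astype(patch.dtype)

CORRUPTIONS = {"blur": box_blur, "noise": gauss_noise}

def corrupt(patch: np.ndarray, name: str, severity: int) -> np.ndarray:
    """Apply one named corruption at a given severity (1 = mild, 5 = severe)."""
    assert 1 <= severity <= 5
    return CORRUPTIONS[name](patch, severity)

# Build a small corrupted benchmark for one patch; the same call can be used
# to sample corruptions on the fly as training-time augmentation.
patch = (np.random.default_rng(1).random((64, 64, 3)) * 255).astype(np.uint8)
benchmark = {(n, s): corrupt(patch, n, s) for n in CORRUPTIONS for s in range(1, 6)}
```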